Forced Derivation Tree based Model Training to Statistical Machine Translation

نویسندگان

Nan Duan

Mu Li

Ming Zhou

چکیده

A forced derivation tree (FDT) of a sentence pair {f, e} denotes a derivation tree that can translate f into its accurate target translation e. In this paper, we present an approach that leverages structured knowledge contained in FDTs to train component models for statistical machine translation (SMT) systems. We first describe how to generate different FDTs for each sentence pair in training corpus, and then present how to infer the optimal FDTs based on their derivation and alignment qualities. As the first step in this line of research, we verify the effectiveness of our approach in a BTGbased phrasal system, and propose four FDTbased component models. Experiments are carried out on large scale English-to-Japanese and Chinese-to-English translation tasks, and significant improvements are reported on both translation quality and alignment quality.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Description of KYOTO EBMT System in PatentMT at NTCIR-10

This paper describes“KYOTO”EBMT system that attended PatentMT at NTCIR-10. When translating very different language pairs such as Japanese-English, it is very important to handle sentences in tree structures to overcome the difference. Many of recent studies incorporate tree structures in some parts of translation process, but not all the way from model training (parallel sentence alignment) to...

متن کامل

Rule Markov Models for Fast Tree-to-String Translation

Most statistical machine translation systems rely on composed rules (rules that can be formed out of smaller rules in the grammar). Though this practice improves translation by weakening independence assumptions in the translation model, it nevertheless results in huge, redundant grammars, making both training and decoding inefficient. Here, we take the opposite approach, where we only use mini...

متن کامل

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

Akamon: An Open Source Toolkit for Tree/Forest-Based Statistical Machine Translation

We describe Akamon, an open source toolkit for tree and forest-based statistical machine translation (Liu et al., 2006; Mi et al., 2008; Mi and Huang, 2008). Akamon implements all of the algorithms required for tree/forestto-string decoding using tree-to-string translation rules: multiple-thread forest-based decoding, n-gram language model integration, beamand cube-pruning, k-best hypotheses ex...

متن کامل

N-Gram-Based Statistical Machine Translation versus Syntax Augmented Machine Translation: Comparison and System Combination

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2012

Forced Derivation Tree based Model Training to Statistical Machine Translation

نویسندگان

چکیده

منابع مشابه

Description of KYOTO EBMT System in PatentMT at NTCIR-10

Rule Markov Models for Fast Tree-to-String Translation

A new model for persian multi-part words edition based on statistical machine translation

Akamon: An Open Source Toolkit for Tree/Forest-Based Statistical Machine Translation

N-Gram-Based Statistical Machine Translation versus Syntax Augmented Machine Translation: Comparison and System Combination

عنوان ژورنال:

اشتراک گذاری